AITopics | test program

Collaborating Authors

test program

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

30d411fdc0e6daf092a74354094359bb-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 08:47:31 GMT

artificial intelligence, machine learning, pe solution, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

238c98450b1d9e8055f94d22f303bb57-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-25-2026, 02:08:05 GMT

artificial intelligence, experiment, machine learning, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.47)

Add feedback

Impact of Data-Oriented and Object-Oriented Design on Performance and Cache Utilization with Artificial Intelligence Algorithms in Multi-Threaded CPUs

Arantes, Gabriel M., Pinto, Richard F., Dalmazo, Bruno L., Borges, Eduardo N., Lucca, Giancarlo, de Mattos, Viviane L. D., Cardoso, Fabian C., Berri, Rafael A.

arXiv.org Artificial IntelligenceDec-10-2025

This study provides a comprehensive performance analysis of Data-Oriented Design (DOD) versus the traditional Object-Oriented Design (OOD), focusing on cache utilization and efficiency in multi-threaded environments. We developed and compared four distinct versions of the A* search algorithm: single-threaded OOD (ST -OOD), single-threaded DOD (ST -DOD), multi-threaded OOD (MT -OOD), and multi-threaded DOD (MT -DOD). The evaluation was based on metrics including execution time, memory usage, and CPU cache misses. In multi-threaded tests, the DOD implementation demonstrated considerable performance gains, with faster execution times and a lower number of raw system calls and cache misses. While OOD occasionally showed marginal advantages in memory usage or percentage-based cache miss rates, DOD's efficiency in data-intensive operations was more evident. Furthermore, our findings reveal that for a fine-grained task like the A* algorithm, the overhead associated with thread management led to single-threaded versions significantly outperforming their multi-threaded counterparts in both paradigms. We conclude that even when performance differences appear subtle in simple algorithms, the consistent advantages of DOD in critical metrics highlight its foundational architectural superiority, suggesting it is a more effective approach for maximizing hardware efficiency in complex, large-scale AI and parallel computing tasks.

cache miss, object-oriented architecture, programming language, (17 more...)

arXiv.org Artificial Intelligence

2512.07841

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.66)

Industry: Government > Military (1.00)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (1.00)

Add feedback

iPanda: An LLM-based Agent for Automated Conformance Testing of Communication Protocols

Sun, Xikai, Dang, Fan, Jiang, Shiqi, Xu, Jingao, Liu, Kebin, Miao, Xin, Yang, Zihao, Zhang, Weichen, Lu, Haimo, Zheng, Yawen, Liu, Yunhao

arXiv.org Artificial IntelligenceJul-30-2025

Conformance testing is essential for ensuring that protocol implementations comply with their specifications. However, traditional testing approaches involve manually creating numerous test cases and scripts, making the process labor-intensive and inefficient. Recently, Large Language Models (LLMs) have demonstrated impressive text comprehension and code generation abilities, providing promising opportunities for automation. In this paper, we propose iPanda, the first framework that leverages LLMs to automate protocol conformance testing. Given a protocol specification document and its implementation, iPanda first employs a keyword-based method to automatically generate comprehensive test cases. Then, it utilizes retrieval-augmented generation and customized CoT strategy to effectively interpret the implementation and produce executable test programs. To further enhance programs' quality, iPanda incorporates an iterative optimization mechanism to refine generated test scripts interactively. Finally, by executing and analyzing the generated tests, iPanda systematically verifies compliance between implementations and protocol specifications. Comprehensive experiments on various protocols show that iPanda significantly outperforms pure LLM-based approaches, improving the success rate (Pass@1) of test-program generation by factors ranging from 4.675 times to 10.751 times.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2507.00378

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

L0-Reasoning Bench: Evaluating Procedural Correctness in Language Models via Simple Program Execution

Sun, Simeng, Hsieh, Cheng-Ping, Ladhak, Faisal, Arakelyan, Erik, Serano, Santiago Akle, Ginsburg, Boris

arXiv.org Artificial IntelligenceApr-14-2025

Complex reasoning tasks often rely on the ability to consistently and accurately apply simple rules across incremental steps, a foundational capability which we term "level-0" reasoning. To systematically evaluate this capability, we introduce L0-Bench, a language model benchmark for testing procedural correctness -- the ability to generate correct reasoning processes, complementing existing benchmarks that primarily focus on outcome correctness. Given synthetic Python functions with simple operations, L0-Bench grades models on their ability to generate step-by-step, error-free execution traces. The synthetic nature of L0-Bench enables systematic and scalable generation of test programs along various axes (e.g., number of trace steps). We evaluate a diverse array of recent closed-source and open-weight models on a baseline test set. All models exhibit degradation as the number of target trace steps increases, while larger models and reasoning-enhanced models better maintain correctness over multiple steps. Additionally, we use L0-Bench to explore test-time scaling along three dimensions: input context length, number of solutions for majority voting, and inference steps. Our results suggest substantial room to improve "level-0" reasoning and potential directions to build more reliable reasoning systems.

large language model, machine learning, qwen2, (20 more...)

arXiv.org Artificial Intelligence

2503.22832

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.73)
(2 more...)

Add feedback

LLM-Vectorizer: LLM-based Verified Loop Vectorizer

Taneja, Jubi, Laird, Avery, Yan, Cong, Musuvathi, Madan, Lahiri, Shuvendu K.

arXiv.org Artificial IntelligenceJun-7-2024

Vectorization is a powerful optimization technique that significantly boosts the performance of high performance computing applications operating on large data arrays. Despite decades of research on auto-vectorization, compilers frequently miss opportunities to vectorize code. On the other hand, writing vectorized code manually using compiler intrinsics is still a complex, error-prone task that demands deep knowledge of specific architecture and compilers. In this paper, we evaluate the potential of large-language models (LLMs) to generate vectorized (Single Instruction Multiple Data) code from scalar programs that process individual array elements. We propose a novel finite-state machine multi-agents based approach that harnesses LLMs and test-based feedback to generate vectorized code. Our findings indicate that LLMs are capable of producing high performance vectorized code with run-time speedup ranging from 1.1x to 9.4x as compared to the state-of-the-art compilers such as Intel Compiler, GCC, and Clang. To verify the correctness of vectorized code, we use Alive2, a leading bounded translation validation tool for LLVM IR. We describe a few domain-specific techniques to improve the scalability of Alive2 on our benchmark dataset. Overall, our approach is able to verify 38.2% of vectorizations as correct on the TSVC benchmark dataset.

compiler, llm-vectorizer, vectorized code, (16 more...)

arXiv.org Artificial Intelligence

2406.04693

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

White-box Compiler Fuzzing Empowered by Large Language Models

Yang, Chenyuan, Deng, Yinlin, Lu, Runyu, Yao, Jiayi, Liu, Jiawei, Jabbarvand, Reyhaneh, Zhang, Lingming

arXiv.org Artificial IntelligenceOct-24-2023

Compiler correctness is crucial, as miscompilation falsifying the program behaviors can lead to serious consequences. In the literature, fuzzing has been extensively studied to uncover compiler defects. However, compiler fuzzing remains challenging: Existing arts focus on black- and grey-box fuzzing, which generates tests without sufficient understanding of internal compiler behaviors. As such, they often fail to construct programs to exercise conditions of intricate optimizations. Meanwhile, traditional white-box techniques are computationally inapplicable to the giant codebase of compilers. Recent advances demonstrate that Large Language Models (LLMs) excel in code generation/understanding tasks and have achieved state-of-the-art performance in black-box fuzzing. Nonetheless, prompting LLMs with compiler source-code information remains a missing piece of research in compiler testing. To this end, we propose WhiteFox, the first white-box compiler fuzzer using LLMs with source-code information to test compiler optimization. WhiteFox adopts a dual-model framework: (i) an analysis LLM examines the low-level optimization source code and produces requirements on the high-level test programs that can trigger the optimization; (ii) a generation LLM produces test programs based on the summarized requirements. Additionally, optimization-triggering tests are used as feedback to further enhance the test generation on the fly. Our evaluation on four popular compilers shows that WhiteFox can generate high-quality tests to exercise deep optimizations requiring intricate conditions, practicing up to 80 more optimizations than state-of-the-art fuzzers. To date, WhiteFox has found in total 96 bugs, with 80 confirmed as previously unknown and 51 already fixed. Beyond compiler testing, WhiteFox can also be adapted for white-box fuzzing of other complex, real-world software systems in general.

compiler, optimization, whitefox, (14 more...)

arXiv.org Artificial Intelligence

2310.15991

Country:

North America > United States > Illinois > Champaign County > Urbana (0.05)
Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
(2 more...)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

eBPF-based Working Set Size Estimation in Memory Management

Lian, Zhilu, Li, Yangzi, Chen, Zhixiang, Shan, Shiwen, Han, Baoxin, Su, Yuxin

arXiv.org Artificial IntelligenceJan-16-2023

Working set size estimation (WSS) is of great significance to improve the efficiency of program executing and memory arrangement in modern operating systems. Previous work proposed several methods to estimate WSS, including self-balloning, Zballoning and so on. However, these methods which are based on virtual machine usually cause a large overhead. Thus, using those methods to estimate WSS is impractical. In this paper, we propose a novel framework to efficiently estimate WSS with eBPF (extended Berkeley Packet Filter), a cutting-edge technology which monitors and filters data by being attached to the kernel. With an eBPF program pinned into the kernel, we get the times of page fault and other information of memory allocation. Moreover, we collect WSS via vanilla tool to train a predictive model to complete estimation work with LightGBM, a useful tool which performs well on generating decision trees over continuous value. The experimental results illustrate that our framework can estimate WSS precisely with 98.5\% reduction in overhead compared to traditional methods.

artificial intelligence, ebpf program, machine learning, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICSS55994.2022.00036

2303.05919

Country:

Asia > China > Guangdong Province > Zhuhai (0.05)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > California > Santa Clara County > Santa Clara (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Learning to Combine Per-Example Solutions for Neural Program Synthesis

Shrivastava, Disha, Larochelle, Hugo, Tarlow, Daniel

arXiv.org Artificial IntelligenceNov-1-2021

The goal of program synthesis from examples is to find a computer program that is consistent with a given set of input-output examples. Most learning-based approaches try to find a program that satisfies all examples at once. Our work, by contrast, considers an approach that breaks the problem into two stages: (a) find programs that satisfy only one example, and (b) leverage these per-example solutions to yield a program that satisfies all examples. We introduce the Cross Aggregator neural network module based on a multi-head attention mechanism that learns to combine the cues present in these per-example solutions to synthesize a global solution. Evaluation across programs of different lengths and under two different experimental settings reveal that when given the same time budget, our technique significantly improves the success rate over PCCoder [32] and other ablation baselines.

logic & formal reasoning, machine learning, pe solution, (21 more...)

arXiv.org Artificial Intelligence

2106.07175

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States > New York > New York County > New York City (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.62)

Add feedback